- Capturing workflows and improving methods reproducibility (Goodman, Fanelli, and Ioannidis 2016)
- Think like a computer!
- Plan your work; work your plan
- Consistency, standard formats
- Tidy data
- Haven't said anything about "openness"…yet
2017-01-29 09:13:10
done.collecting.data = FALSE
while (!done.collecting.data) {
  Collect.sample()  # placeholder: collect one sample and update collected.sample.n
  if (collected.sample.n >= planned.sample.n) {
    done.collecting.data = TRUE
  }
}
http://datasci.kitzes.com/lessons/python/reproducible_workflow.html
study-1/
  sub-001/
    sub-001-measure-a.txt
    sub-001-image.jpg
    sub-001-demo.csv
    sub-001-measure-b.txt
  sub-002/
    sub-002-measure-a.txt
    sub-002-image.jpg
    sub-002-demo.csv
    sub-002-measure-b.txt
  ...
  sub-00n/
    ...
study-1/
  measure-a/
    sub-001-measure-a.txt
    ...
  measure-b/
    sub-001-measure-b.txt
    ...
  image/
    sub-001-image.jpg
    sub-002-image.jpg
    ...
  demo/
    sub-001-demo.csv
    sub-002-demo.csv
    ...
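A consistent naming scheme like the one in these listings is easier to keep if the names are generated in code rather than typed by hand. A minimal sketch in R, assuming a hypothetical helper `make.file.name()` (not part of the original material) and three-digit, zero-padded subject numbers as in the listings:

```r
# Hypothetical helper: build a file name that follows the
# sub-NNN-<measure>.<ext> pattern used in the listings.
make.file.name <- function(sub.n, measure, ext) {
  sprintf("sub-%03d-%s.%s", sub.n, measure, ext)
}

# File names for the first two participants' measure-a files
measure.a.files <- make.file.name(1:2, "measure-a", "txt")
print(measure.a.files)

# Full path inside the measure-based layout
first.path <- file.path("study-1", "measure-a", measure.a.files[1])
print(first.path)
```

Because `sprintf()` is vectorized, the same call produces names for any number of participants.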
study-1/
  analysis/
    data/
      sessions/
        2017-01-09-sub-001/
        ...
      aggregate/
        study-1-demo-aggregate.csv
        study-1-measure-a-aggregate.csv
        ...
    R/
    img/
    reports/
  protocol/
    code/
      my-experiment.m
    materials/
      stim-1.jpg
      stim-2.jpg
      ...
  pubs/
    presentations/
    papers/
    refs/
  grants/
    2016/
    2017/
  irb/
  mtgs/
## Create project directory
# dir.exists() checks the file system; exists() only checks for an R object
proj.name = "tmp_proj"
if (!dir.exists(proj.name)) {
  dir.create(path = proj.name, recursive = TRUE)
}
# Create sessions directory
sessions.dir = paste(proj.name, "analysis/sessions", sep="/")
if (!dir.exists(sessions.dir)) {
  dir.create(path = sessions.dir, recursive = TRUE) # creates intermediate dirs
}
# Aggregate data file directory
agg.dir = paste(proj.name, "analysis/aggregate", sep="/")
if (!dir.exists(agg.dir)) {
  dir.create(path = agg.dir, recursive = TRUE)
}
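The same steps can be wrapped in a small function so that every new project gets an identical skeleton. The sketch below is one way to do it, assuming a hypothetical function `create.proj.dirs()` and only the two analysis subdirectories created in the code above. It writes into a temporary directory by default, so it is safe to run as-is.

```r
# Hypothetical helper: create a project skeleton under `root`.
# The subdirectory list is an assumption; extend it to match your layout.
create.proj.dirs <- function(proj.name, root = tempdir()) {
  sub.dirs <- c("analysis/sessions", "analysis/aggregate")
  paths <- file.path(root, proj.name, sub.dirs)
  for (p in paths) {
    if (!dir.exists(p)) {
      dir.create(path = p, recursive = TRUE)  # creates intermediate dirs
    }
  }
  invisible(paths)  # return the paths without printing them
}

created <- create.proj.dirs("tmp_proj")
print(all(dir.exists(created)))
```

Calling the function a second time is harmless, because existing directories are skipped.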
- lowerCamelCaseIsGood.txt; so is UpperCamelCase.txt
- underscores_between_words.txt works; so do dashes-between.txt
- Avoid spaces in your file names.txt; not all computers read them reliably.

subID,sex,ageYrs,favColor
001,m,53,green
002,f,51,blue
003,f,23,red
004,m,25,aqua
Don't put spaces between variables in comma-separated value (.csv) files. Also, make sure to add a final line feed/enter character.
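To see why stray spaces matter, the sketch below reads a small CSV string with and without R's `strip.white` argument, then writes the cleaned table back out and checks that the file ends with a line feed. The data values are made up for illustration.

```r
# Illustrative CSV text with a space after each comma (values are made up)
csv.text <- "subID,sex\n001, m\n002, f\n"

# By default the leading space becomes part of the value
raw <- read.csv(text = csv.text, colClasses = "character")
print(raw$sex)

# strip.white = TRUE trims the padding while reading
clean <- read.csv(text = csv.text, colClasses = "character", strip.white = TRUE)
print(clean$sex)

# write.csv() terminates every row, including the last, with a line feed
out.file <- tempfile(fileext = ".csv")
write.csv(clean, file = out.file, row.names = FALSE, quote = FALSE)
bytes <- readBin(out.file, what = "raw", n = file.size(out.file))
last.byte <- bytes[length(bytes)]
print(last.byte == as.raw(10))  # 10 is the line feed character
```

Trimming on read is a workaround; a file written without padding in the first place is still the safer choice.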
subID,condition,rt
001,upright,250
001,inverted,300
002,upright,225
002,inverted,290
003,upright,270
003,inverted,230
004,upright,210
004,inverted,240
# read data files, first row (header) contains variable names
demo <- read.csv(file = "csv/study-1-demo-agg.csv", header = TRUE)
rt <- read.csv(file = "csv/study-1-rt-agg.csv", header = TRUE)
# merge and print
merged <- merge(demo, rt, by = "subID")
merged
##   subID sex ageYrs favColor condition  rt
## 1     1   m     53    green   upright 250
## 2     1   m     53    green  inverted 300
## 3     2   f     51     blue   upright 225
## 4     2   f     51     blue  inverted 290
## 5     3   f     23      red   upright 270
## 6     3   f     23      red  inverted 230
## 7     4   m     25     aqua   upright 210
## 8     4   m     25     aqua  inverted 240
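The printed `subID` values lost their leading zeros because `read.csv()` read the column as numbers. If the IDs should stay as text, one option is to declare the column type when reading. The sketch below uses inline copies of the first rows of the two tables instead of the files, so it runs without the `csv/` directory. The `colClasses` choice is an assumption about how the IDs should be treated, not something the original analysis specifies.

```r
# Inline copies of the first rows of the two tables shown earlier
demo.text <- "subID,sex,ageYrs,favColor\n001,m,53,green\n002,f,51,blue\n"
rt.text <- paste0("subID,condition,rt\n",
                  "001,upright,250\n001,inverted,300\n",
                  "002,upright,225\n002,inverted,290\n")

# Read subID as character so "001" is not converted to the number 1;
# the other columns are still type-converted automatically
demo <- read.csv(text = demo.text, colClasses = c(subID = "character"))
rt <- read.csv(text = rt.text, colClasses = c(subID = "character"))

# Merge on the shared key; each participant contributes one row per condition
merged <- merge(demo, rt, by = "subID")
print(merged)
```

Keeping the key as text in both tables also guards against mismatches when one file stores `001` and another stores `1`.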
Goodman, Steven N., Daniele Fanelli, and John P. A. Ioannidis. 2016. “What Does Research Reproducibility Mean?” Science Translational Medicine 8 (341): 341ps12. doi:10.1126/scitranslmed.aaf5027.
Wickham, Hadley. 2014. “Tidy Data.” Journal of Statistical Software 59 (10). doi:10.18637/jss.v059.i10.